Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification Part II: Analysis and Extensions
نویسندگان
چکیده
In part I of this work we introduced and evaluated the Generalized Local Learning (GLL) framework for producing local causal and Markov blanket induction algorithms. In the present second part we analyze the behavior of GLL algorithms and provide extensions to the core methods. Specifically, we investigate the empirical convergence of GLL to the true local neighborhood as a function of sample size. Moreover, we study how predictivity improves with increasing sample size. Then we investigate how sensitive are the algorithms to multiple statistical testing, especially in the presence of many irrelevant features. Next we discuss the role of the algorithm parameters and also show that Markov blanket and causal graph concepts can be used to understand deviations from optimality of state-of-the-art non-causal algorithms. The present paper also introduces the following extensions to the core GLL framework: parallel and distributed versions of GLL algorithms, versions with false discovery rate control, strategies for constructing novel heuristics for specific domains, and divide-and-conquer local-to-global learning (LGL) strategies. We test the generality of the LGL approach by deriving a novel LGL-based algorithm that compares favorably c ©2010 Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani and Xenofon D. Koutsoukos. ALIFERIS, STATNIKOV, TSAMARDINOS, MANI AND KOUTSOUKOS to the state-of-the-art global learning algorithms. In addition, we investigate the use of non-causal feature selection methods to facilitate global learning. Open problems and future research paths related to local and local-to-global causal learning are discussed.
منابع مشابه
Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification Part I: Algorithms and Empirical Evaluation
We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways...
متن کاملA Unified View of Causal and Non-causal Feature Selection
In this paper, we unify causal and non-causal feature selection methods based on the Bayesian network framework. We first show that the objectives of causal and non-causal feature selection methods are equal and are to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We demonstrate that causal and non-causal feature selection take different...
متن کاملAlgorithms for Large Scale Markov Blanket Discovery
This paper presents a number of new algorithms for discovering the Markov Blanket of a target variable T from training data. The Markov Blanket can be used for variable selection for classification, for causal discovery, and for Bayesian Network learning. We introduce a low-order polynomial algorithm and several variants that soundly induce the Markov Blanket under certain broad conditions in d...
متن کاملLearning Markov Blankets for Continuous or Discrete Networks via Feature Selection
Markov Blankets discovery algorithms are important for learning a Bayesian network structure. We present an argument that tree ensemble masking measures can provide an approximate Markov blanket. Then an ensemble feature selection method is used to learn Markov blankets for either discrete or continuous networks (without linear, Gaussian assumptions). We compare our algorithm in the causal stru...
متن کاملUsing Markov Blankets for Causal Structure Learning
We show how a generic feature-selection algorithm returning strongly relevant variables can be turned into a causal structure-learning algorithm. We prove this under the Faithfulness assumption for the data distribution. In a causal graph, the strongly relevant variables for a node X are its parents, children, and children’s parents (or spouses), also known as the Markov blanket of X . Identify...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of Machine Learning Research
دوره 11 شماره
صفحات -
تاریخ انتشار 2010